fix(update_service): fixes TaskDefinition inactive exception by rokibulislaam · Pull Request #135 · GetSimpl/cloudlift

rokibulislaam · 2023-05-04T07:42:15Z

Added a new tag in task definition "task_definition_source" to identify if task definition was created with CloudFormation or with boto
Included a preflight step to verify whether the existing deployment is dirty, and if it is, whether an image with the tag 'master' is available on ECR.

More context:

Whenever we execute the create_service command, a CloudFormation template is generated. Within this template, in the ECS Task definition section, we set the default Docker image URL with the master tag if the current git status is dirty or if the work tree is not clean. Additionally, we keep the DesiredCount as zero since, at this stage, we don't have an image uploaded to ECR. If we were to proceed with deployment, it would fail. Consequently, at this point, CloudFormation creates a task definition with the revision Family:1.

On the other hand, when we execute the deploy_service command, we build a Docker image and upload it to ECR with a tag based on the current git status or a specific version provided using the --version option. If the work tree is dirty, we append the dirty tag to the Docker image and push it to ECR. Next, we manually create a new task definition using boto3, utilizing the Docker image URI we just deployed with the corresponding image tag. We then increase the desired count from 0 to 1, prompting ECS to start the task. At this stage, boto3 creates a new task definition Family:2, while also deactivating Family:1 using boto3.

What's the issue?

The problem arises when we execute the update_service command, as we rely on CloudFormation to make the necessary changes. We regenerate the CloudFormation template and request CloudFormation to create a changeset. CloudFormation compares the old and new templates, resulting in the creation of a changeset. Now, two scenarios can occur:

If we make changes to the cloudlift service configuration that affect the task definition, such as updating the memory_reservation which modifies memoryReservation in the ECS task definition. In this scenario, CloudFormation creates a new task definition with a new revision, Family:3. However, in the new task definition, it sets the new image URI with the master tag. The CloudFormation stack update is successful, but since we had a dirty worktree, we only have one image deployed on ECR with the dirty tag. As a result, ECS attempts to run the task but continually fails since the image with the master tag does not exist. This is where the preflight check becomes relevant.
If we make changes to the cloudlift configuration that do not affect the task definition, such as updating the ELB/ALB health check path with health_check_path, which alters the Load Balancer and target group. Once again, we create a changeset and execute it. However, CloudFormation attempts to reference the last task definition created by itself, which in our case is Family:1. Nonetheless, we manually deactivated this task definition using boto3. Consequently, the CloudFormation stack update fails with a client exception: TaskDefinition is inactive, Status Code: 400. To address this, we are introducing a new tag (task_definition_source) to the task definition to identify whether the previous task definition was created by CloudFormation or boto3. If it was created by CloudFormation, we do not deregister the task definition.

Point to remember is CloudFormation always attempts to reference the task definition it created, regardless of the version number. Even if you manually created a new task definition, such as Family:999, and updated the service to run from Family:999, CloudFormation would still try to reference the version it created during its last operation. Could this be a bug in CloudFormation or is it designed to behave so?

- Added a new tag in task definition "task_definition_source" to identify if task definition was created with cloudformation or with boto - Included a preflight step to verify whether the existing deployment is dirty, and if it is, whether an image with the tag 'master' is available on ECR.

cloudlift/deployment/service_creator.py

rokibulislaam · 2023-05-29T05:59:08Z

One crucial case it's not covering is, if someone tries to update only the properties which don't affect TD during update_service for example, updating only the ELB health check path. But this pre-flight check is not letting it go beyond if an image with master tag is not pushed to ECR (if the current deployment is dirty) irrespective of the changes like an example discussed earlier.

Need to cover this, what's your opinion @praveenraghav01 ?

praveenraghav01 · 2023-05-29T06:48:07Z

I will suggest that we only do this if the changes include creation of a new Service. For rest we can just ignore image check

Moved `update_service` preflight checks function to pre-flight.py

shoan · 2023-08-28T07:17:06Z

cloudlift/config/pre_flight.py

+        repo_name = service_name + '-repo'
+        res = ecr_client.batch_get_image(repositoryName=repo_name, imageIds=[{'imageTag': 'master'}])
+        if res['images'] == []:
+            raise UnrecoverableException("Current deployment is dirty. Please push an image tagged as 'master' to ECR.")


On further thought, we should just bail stating that the current deployment is dirty and that an update cannot be made.

shoan · 2023-09-25T10:45:43Z

cloudlift/deployment/service_creator.py

@@ -164,4 +169,4 @@ def _print_progress(self):
        if "FAIL" in final_status:
            log_err("Finished with status: %s" % (final_status))


Should emit a non-zero exit code

shoan · 2023-09-25T10:49:43Z

cloudlift/config/pre_flight.py

+        repo_name = service_name + '-repo'
+        res = ecr_client.batch_get_image(repositoryName=repo_name, imageIds=[{'imageTag': 'master'}])
+        if res['images'] == []:
+            raise UnrecoverableException("Current deployment is dirty. Please push an image tagged as 'master' to ECR.")


Need to add cloudlift specific command to push to ecr. This can be done with upload_to_ecr

praveenraghav01 self-requested a review May 4, 2023 08:32

praveenraghav01 suggested changes May 26, 2023

View reviewed changes

cloudlift/deployment/service_creator.py Outdated Show resolved Hide resolved

praveenraghav01 requested review from PronomitaDey and krishnanand275 May 26, 2023 10:47

rokibulislaam self-assigned this May 29, 2023

rokibulislaam added 2 commits May 29, 2023 17:03

refactor(update_service): move preflight checks to pre-flight.py

e545385

Moved `update_service` preflight checks function to pre-flight.py

fix(create_task_def/update_task_def): handle cases for empty tags

30f6a39

shoan suggested changes Sep 25, 2023

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(update_service): fixes TaskDefinition inactive exception#135

fix(update_service): fixes TaskDefinition inactive exception#135
rokibulislaam wants to merge 3 commits intoGetSimpl:masterfrom
rokibulislaam:fix/TaskDefinition-inactive-error

rokibulislaam commented May 4, 2023 •

edited

Loading

Uh oh!

Uh oh!

rokibulislaam commented May 29, 2023 •

edited

Loading

Uh oh!

praveenraghav01 commented May 29, 2023

Uh oh!

shoan Aug 28, 2023

Uh oh!

shoan Sep 25, 2023

Uh oh!

shoan Sep 25, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		@@ -164,4 +169,4 @@ def _print_progress(self):
		if "FAIL" in final_status:
		log_err("Finished with status: %s" % (final_status))

Conversation

rokibulislaam commented May 4, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

More context:

What's the issue?

Uh oh!

Uh oh!

rokibulislaam commented May 29, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

praveenraghav01 commented May 29, 2023

Uh oh!

shoan Aug 28, 2023

Choose a reason for hiding this comment

Uh oh!

shoan Sep 25, 2023

Choose a reason for hiding this comment

Uh oh!

shoan Sep 25, 2023

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

rokibulislaam commented May 4, 2023 •

edited

Loading

rokibulislaam commented May 29, 2023 •

edited

Loading